Sound spectrography, as a powerful tool in acoustics in general, has also found
a wide field of application in ornithoacoustics and given rise to a great body of
literature all over the world. In this paper we wish to advance the opinion that
the problem of its methodological adequacy in certain applications is ripe
enough to be re-examined somewhat more deeply and confronted with another
methodological approach as far as the study of intonation structure of bird
vocalizations is concerned.


I 


In linear systems the frequency bandwidth $\Delta f$ of an oscillatory process is
inversely proportional to its duration $\Delta t$. This may be expressed in the form of
the well known uncertainty principle


$$\Delta f \, \Delta t = u$$


where the dimensionless constant $u$ depends on how the bandwidth and duration
are defined (in nontrivial cases) (KHARKEVICH, 1962; GOLDMAN, 1948;
KÜPFMÜLLER, 1949; WINCKEL, 1960; PIMONOV, 1962; STEWART, 1931; KOCK,
1935; GABOR, 1946, 1950; CORLISS, 1962; FILIP, 1970). Applied to acoustics,
and verbally interpreted, this means that the two variables, bandwidth and
duration, are not mutually independent; that is, tones of limited duration cannot
be expected to have an infinitely narrow (“line”) frequency spectrum: when the
duration of a tone decreases its frequency uncertainty increases, and vice
versa, if the frequency of a tone is to be determined more definitely the tone
should be longer. For the special case of a gated sinusoid the constant $u$ equals 2.


*Dr. Péter Szőke, ELTE Állatrendszertani és Ökológiai Tanszék (Zoosystematical and Ecological Institute of the
Loránd Eötvös University), Budapest, VIII. Puskin u. 3; Dr. Miroslav Filip, Department of Musicology, Comenius
University, Bratislava, Czechoslovakia (1973).

1 Fundamental ideas were presented to the 15th International Ornithological Congress, The Hague, August 31,
1970 (Szőke–Tarnóczy–Filip). Figures of this paper with few exceptions were also projected in the form of slides
accompanied by corresponding tape recordings. The projection was repeated and discussed also at the 622nd Session
of the Zoological Section of the Hungarian Biological Society, Budapest, February 5, 1971.
Then, for example, a tone with a 200-msec duration cannot be said to have its
“nominal” frequency as a unique value, since it follows from the uncertainty
principle that its frequency-domain representation has a bandwidth $\Delta f = 10$ Hz.
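
The arithmetic of this example can be sketched in a few lines of Python. The function names are hypothetical, and reading the bandwidth as the null-to-null width of the main spectral lobe of a rectangularly gated sinusoid is an assumption made here for illustration; the text itself states only that the constant equals 2 for a gated sinusoid.

```python
import math

def bandwidth_hz(duration_s, u=2.0):
    """Bandwidth from the uncertainty relation: bandwidth * duration = u."""
    return u / duration_s

def gated_sine_spectrum(offset_hz, duration_s):
    """Magnitude spectrum of a rectangularly gated sinusoid as a function of
    the offset from its nominal frequency (idealized, assumed model)."""
    if offset_hz == 0.0:
        return duration_s
    x = math.pi * offset_hz * duration_s
    return abs(duration_s * math.sin(x) / x)

# A 200 ms tone: the relation gives a 10 Hz bandwidth, and in the assumed
# model the first spectral nulls lie 5 Hz on either side of the nominal
# frequency, i.e. the main lobe is 10 Hz wide.
print(bandwidth_hz(0.200))
print(gated_sine_spectrum(5.0, 0.200))  # practically zero (a null)
print(gated_sine_spectrum(2.5, 0.200))  # clearly non-zero inside the lobe
```

Halving the duration to 100 msec doubles the bandwidth under the same relation, which is the trade-off the paragraph describes.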

All conventional spectrum analysis methods are linear transforms. This
implies that the analysis bandwidth cannot be made arbitrarily narrow in order
to attain a high frequency resolution or, in effect, high accuracy of frequency
measurement (see above, and: SOANES, 1952; BLACKMAN & TUKEY, 1959;
TOPSKIJ, 1962; TOVE & PERS, 1964; KRIKSUNOV, 1965). In other words, the
frequency resolution of a filter-type analyzer is again inversely proportional
to its time resolution. The more rapid the successions of tones to be analyzed,
the wider the analyzing filter bandwidth must be. Consequently, such an
analyzer is well suited to measurements where it is the spectrum density (or SPL
density) that is of primary interest. If, however, the goals of investigation call for
measuring the varying instantaneous frequency of consecutive tones as a
function of time, then conventional spectrography (e.g. the sonogram) is far from
being an optimum method.

The most widely known spectrographic method, represented by the “Sonagraph”,
has been designed specifically for analysis of formant structure in human
speech.2 It is quite natural that the instrument has found its many applications
in ornithoacoustics, too. For intonation studies, however, different methods had
been introduced in phonetics and in musicology as early as in 1937
(GRÜTZMACHER & LOTTERMOSER, 1937, 1938, 1940; OBATA & KOBAYASHI, 1937, 1938, ...).


No doubt, there exist acoustic phenomena and parameters that require the
use of a spectrum analyzer, e.g. the Sonagraph, in bioacoustics. On the other
hand (and this motivated our paper) bird vocalizations usually have the form
of a sequence of more or less distinct tones with more or less definite frequency,
including patterns with continuously changing frequency too.3 If the acoustic
structure of bird vocalization is of this nature, and if this is what one wishes
to study, then a method of instantaneous frequency recording is the only adequate
approach.4 The application of the Sonagraph must be viewed just as a routine
use of equipment at the scientist’s disposal, but is not an optimum solution
from the methodological point of view.5

While spectrography is a frequency-domain method based on the Fourier
transform of the input time function, the instantaneous frequency (period)
measurement is based on the time-domain definition of periodicity (KNESER, 1948;
UNGEHEUER, 1963; RIGHINI, 1964; TOVE & PERC, 1964; McKINNEY, 1965;
Kory, 1958; FILIP, 1969, 19705; LEON & MARTIN, 1970). The (fundamental)
period of a quasi-periodic input signal is then defined by two consecutive posi-

2 The “sonograph” and “visible speech” techniques began their large bibliography with a paper by R. K. Potter:
1945, Science, 102: 465–470, and particularly by a series of papers, 1946, in J. Acoust. Soc. Amer., 18: 1–75, and the
book Potter, Kopp, Green, Visible Speech, New York, Van Nostrand, 1947. An updated bibliography would exceed
the scope of this paper.


3 Sequences of rapid short sounds or crowded micro-patterns often treated in the literature or shown on the
sonograms as “impure” or “obscure” or “smeared” sounds, i.e. practically as “noises”, may prove to be relatively clear
intonations well representable graphically (sometimes also musically) as well as with an instantaneous-frequency
recorder, if slowed down sufficiently and professionally.

4 To the best of our knowledge, the application of instantaneous frequency recording to bird vocalization has
been described in Fish, 1955; Tove–Norman–Isaksson–Czekajewski, 1966; Szőke–Gunn–Filip, 1969; Hjorth, 1970;
Szőke–Tarnóczy–Filip, 1970.

5 The frequent statement that the Sonagraph performs the “frequency/time” analysis (Thorpe, 1961; Marler,
1969; Thielcke, 1966; Hinde, 1969; Borror & Halafoff, 1969 etc.) is to be understood in the sense that the result of
the analysis is plotted in a coordinate system with a frequency axis and a time axis. The result of the analysis itself, i.e.
the sonogram, however, is, strictly speaking, a “spectrum-density/time” graphic representation, not a
“frequency/time” one.
tive-going (or negative-going) zero crossings of a signal derived from the input
signal in such a way that their steady-state frequencies would be equal. The
instantaneous frequency is thus the reciprocal of the instantaneous period and
can be displayed by any kind of time-to-voltage converter, in this case perhaps
more appropriately termed period-to-voltage converter (FILIP, 19700).
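
The time-domain definition above can be illustrated with a short digital sketch. It is not the authors' instrument (an analogue device with its own extraction stage); it assumes a clean, sampled, single-component signal, and the function names and the synthetic test tone are hypothetical. Each frequency value is simply the reciprocal of the interval between two consecutive positive-going zero crossings.

```python
import math

def positive_zero_crossings(samples, sample_rate):
    """Times (s) of positive-going zero crossings, linearly interpolated."""
    times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            frac = -prev / (cur - prev)  # position of the crossing between samples
            times.append((i - 1 + frac) / sample_rate)
    return times

def instantaneous_frequency(samples, sample_rate):
    """(time, frequency) pairs, one per completed period."""
    t = positive_zero_crossings(samples, sample_rate)
    return [(t[k + 1], 1.0 / (t[k + 1] - t[k])) for k in range(len(t) - 1)]

def two_tone(sample_rate=48000):
    """Synthetic test signal: 2 kHz for 5 ms, then 3 kHz for 5 ms."""
    out, phase = [], 0.0
    for n in range(int(0.010 * sample_rate)):
        f = 2000.0 if n < 0.005 * sample_rate else 3000.0
        out.append(math.sin(phase))
        phase += 2.0 * math.pi * f / sample_rate
    return out

for time_s, freq_hz in instantaneous_frequency(two_tone(), 48000):
    print(round(time_s * 1000.0, 3), "ms", round(freq_hz, 1), "Hz")
```

In this idealized case the reading changes from the first tone to the second within a single period, with no smearing across the boundary. Real recordings contain noise and overtones, which is why an extraction stage precedes the period measurement in practice.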

In contrast to the FOURIER-type analysis, this kind of processing is no longer
a linear one and, consequently, is not subject to the uncertainty principle
in the above sense, the accuracy of frequency display being dependent only
upon the accuracy of the instrument itself, and upon the signal-to-noise ratio,
already for the second period after a transient. Only the “first”-period
indication is not valid, but this, in fact, is not defined at all as far as real-time
processing is taken into account.

The Sonagraph offers the choice of one of two bandwidths, “wide” and
“narrow”, usually 300 Hz and 45 Hz. The corresponding time resolution is
inversely proportional, as stated before, and if we take the transient time of a
filter to be the reciprocal of its bandwidth, we have transient times of about 3.3 msec
and 22 msec, respectively. It is obvious that the narrow-band analysis with its
22-msec buildup and decay time of filter response would be usable only in the
relatively few cases of sufficiently “steady” signals without rapid tone
successions, and even the 45-Hz band would represent, for example, about a
third-octave “uncertainty” with respect to a centre frequency of 200 Hz. Moreover,
the narrow-band analysis with its considerably decreased time resolution could
hardly be used in the analysis of frequency modulated tones with high rates of
frequency modulation (MARLER, 1969), which are often found in bird
vocalizations.
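
The figures quoted in this paragraph can be checked with a minimal sketch. The function names are hypothetical, and the octave calculation assumes band edges placed geometrically symmetric about the centre frequency, which is one common convention and not necessarily the one the authors had in mind.

```python
import math

def transient_time_ms(bandwidth_hz):
    """Filter transient time, taken as the reciprocal of the bandwidth."""
    return 1000.0 / bandwidth_hz

def band_in_octaves(bandwidth_hz, centre_hz):
    """Width of a band in octaves, assuming its edges lie at
    centre * 2**(+x/2) and centre * 2**(-x/2)."""
    ratio = bandwidth_hz / (2.0 * centre_hz)
    return 2.0 * math.asinh(ratio) / math.log(2.0)

for name, bandwidth in (("wide", 300.0), ("narrow", 45.0)):
    print(name, round(transient_time_ms(bandwidth), 1), "ms")

print(round(band_in_octaves(45.0, 200.0), 2), "octave at 200 Hz")
```

Under these assumptions the wide and narrow filters give roughly 3.3 and 22 msec, and the 45-Hz band at 200 Hz comes out close to one third of an octave, in line with the text.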

Thus, the question is not which of the two bandwidths is more appropriate
in general, but whether sound spectrography is an appropriate method at all. It
was our aim to show that it is not, if we wish to study the intonation structure
of bird vocalizations. Then the instantaneous frequency recording is the only
methodologically justified approach to the problem.7

In instantaneous-frequency recording it is conceptually useful to discern two
stages of signal processing, namely, the extraction stage (extraction of
periodicity information from the complex input signal) and the conversion stage
(period-to-voltage conversion and recording). The instantaneous frequency graphs
presented in this paper have been obtained with an instrument developed by the
second author for the Institute of Musicology, Slovak Academy of Sciences,
Bratislava (1961–64). The extraction method, which may be termed envelope
periodicity detection, has been described elsewhere (FILIP, 1969), so we will not
give the details. The conversion method has been modified from that of
GRÜTZMACHER and LOTTERMOSER, well known in the literature (GRÜTZMACHER &
LOTTERMOSER, 1937, 1938, 1940; FILIP, 1970, 19700).


6 The well known phenomenon of “ringing” is one of the manifestations of limited time resolution. To achieve
sufficient frequency resolution the selectivity of the filter must be sufficiently high. Then its impulse response (inverse
Fourier transform of its transmission function) is not aperiodic and the duration of “ringing” is, roughly speaking,
proportional to the selectivity. When a very short tone burst is applied, the filter responds by several periods of
its resonance frequency, thus effectively prolonging the apparent duration of the measured tone. (See also Davis,
1964, p. 127.) The delayed ringing of the narrow-band filter, being even longer than the sound itself, often makes the
thin vertical transient lines thick (as for example in Figs 9N, 11N).

7 It has been used by the authors of this paper since 1964 and reported on various occasions including the 14th
Int. Ornithol. Cong., Oxford 1966, as well as at the same time in an informal meeting and unpublished
communication.
As concerns the relation between objective and subjective methods of
representation (on the basis of sound microscopy, see Section II), it is
worth mentioning that periodicity pitch perception in quasiharmonic
signals such as musical tones and vowels has a much closer analogy in the
time-domain definition of frequency, as implemented in instruments for
recording instantaneous-frequency graphs (NORDMARK, 1968; FILIP, 19704), than
in the spectrum-density definition implemented by pure FOURIER-type
analyzers. Thus the high correlation between objective and subjective graphs has
indeed more than just a practical value.

It should also be pointed out that for the (human) ear, as a nonlinear system,
the uncertainty principle no longer holds in its original sense, valid for linear
systems (LIANG & CHISTOVICH, 1960; CARDOZO, 1962; SEKEY, 1962, 1963;
Scorer, 1963; MAJERNIK, 1964, 1967; KURZE, 1965; GAMBARDELLA &
TRAUTTEUR, 1966; CORLISS, 1967; RONKEN, 1971) and thus its time resolution ability
is much greater than it would be if it were determined by the uncertainty
principle for linear systems and by the admirably high frequency resolution of
the human auditory system. It is assumed that birds’ frequency resolution is
comparable to that of man (KNECHT, 1940; SCHWARTZKOPFF, 1949, 1952, 1955;
GALAMBOS, 1954; THORPE, 1961).8 There is some evidence, however, that their
time resolution capabilities are higher than in man (MARLER, 1969), and
important experiments have been described which indicate that it is considerably
higher (KONISHI, 1969).9 The multiple slowing down of bird vocalization tape
recordings when studied and notated by ear may thus be seen as a
compensation for the difference in the time resolution properties of the human and avian
auditory systems.
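
What slowing down a tape does to a recorded tone can be sketched as follows, assuming plain reduced-speed playback, where every duration is multiplied and every frequency divided by the same factor. The function name and the example tone are hypothetical.

```python
import math

def slowed_playback(frequency_hz, duration_ms, factor):
    """Effect of replaying a recording at 1/factor of its original speed:
    returns (new frequency in Hz, new duration in ms, octaves lowered)."""
    octaves_lower = math.log2(factor)
    return frequency_hz / factor, duration_ms * factor, octaves_lower

# Hypothetical 4 kHz tone lasting 10 ms, slowed down 32 times.
print(slowed_playback(4000.0, 10.0, 32))
```

The time axis is stretched for the listener at the price of a pitch lowered by as many octaves as the base-2 logarithm of the slowing factor.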

If we again take the reciprocal of frequency bandwidth as the transient
time of a filter, and consider this time to be representative for the time
resolution of the analysis, then the time resolution of the instantaneous-frequency
measurement (i.e., one cycle) may be shown to be $Q$ times better than that of the
spectrograph, where $Q$ is the quality factor of the (idealized LC) filter,
$Q = f_0/\Delta f$, with $f_0$ being the centre frequency of the filter and $\Delta f$ its
half-power-point bandwidth.
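
The step behind this statement can be written out as a short worked relation. Here $f_0$ denotes the centre frequency (the subscript is an assumption, the printed symbol being unclear) and $\Delta f$ the half-power bandwidth; the subscripts "spec" and "inst" are labels introduced only for this sketch.

```latex
\begin{align*}
\Delta t_{\text{spec}} &\approx \frac{1}{\Delta f}
  && \text{(transient time of the analyzing filter)} \\
\Delta t_{\text{inst}} &= \frac{1}{f_0}
  && \text{(one cycle at the centre frequency)} \\
\frac{\Delta t_{\text{spec}}}{\Delta t_{\text{inst}}}
  &= \frac{f_0}{\Delta f} = Q
\end{align*}
```

As a purely illustrative case, a 45-Hz filter centred on a tone of 4.5 kHz has $Q = 100$: its transient time is about 22 msec, while one cycle of the tone lasts about 0.22 msec.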

Moreover, the frequency records represent the instantaneous frequency in 
the form of a line (or, equivalently, in the form of a boundary between black 
and white areas as in this paper) with practically negligible width, in accordance 
with the time-domain definition of instantaneous frequency as opposed to the 
spectrum-density definition implemented by a spectrum analyzer. Thus the 
frequency resolution of instantaneous-frequency measurement is practically 
equal to the measurement accuracy which is, as stated above, limited only by the
properties of the equipment and by the signal-to-noise ratio. This is indeed 
negligible if compared to the width of sonogram traces. 


8 Some direct quotations from Thorpe, 1961: “It is everywhere agreed that frequency-analysis or harmonic
analysis is the essential basis of ‘hearing’ in at least higher vertebrates – that is to say, the fish, birds and mammals –
as against hearing by the analysis of amplitude-modulation which predominates in the insects” (120)... “The present
overall picture of the hearing abilities of birds which thus emerges suggests that it is similar to our own in general
range and ability to discriminate pitch. Song birds and parrots certainly approach human abilities...” (127)... “In
conclusion we can say, with Galambos, that the capacity for dealing with tones, as measured by psychological testing,
is not remarkably dissimilar for fish, birds and men” (128).

9 “Most units (avian auditory neurones) exhibit near 100 per cent time-locking to a train of clicks when the
inter-click interval exceeds 1.3–2.0 ms”... “Units can follow click repetition rates lower or higher than their best
frequencies (to which the units are most sensitive), although few units can follow on a one-to-one basis repetition rate higher than
about 1000 clicks per s”... “In comparison with songbirds, a species like the pigeon does not seem to have any rapid
sequence of sounds in its vocalizations, yet its auditory neurones can resolve such sounds” (pp. 566–567).
Besides, our instantaneous-frequency graphs (see Figs 3F, 4F, 5F, 6F, 15F)
are also intended to confirm the validity, in its specific sense, of both graphic
or semigraphic (see e.g. Figs 10G, 7S) and adapted musical notation in
portraying musically structured bird vocalizations, the graphic and semigraphic
representation being suitable also for those structured non-musically.


II


As a result of developing sound microscopy (the prerequisite of the maximum
possible adequacy of any subjective representational mode) into a
fundamental and consistently applied research method, the first author has realized and
comprehensively examined the apparent consequences of the inadequacy of
sound spectrography in its present-day ornithoacoustical applications. In view
of the facts revealed, he has developed (1) a graphic and (2) a semigraphic
(five-line staff) representational mode and (3) an adapted musical notation
for the purposes of a sufficiently reliable aural (subjective) transcription
of the intonation (pitch and time) structure of bird vocalizations (similarly, in
some sense, to that known in ethnomusicology, for example).10 This new method
of musical representation, based on sound microscopy and applicable only to
musically structured vocalizations, is, of course, basically different from the
earlier dilettante and naive attempts at “musical transcription” of natural
bird sounds, applied even to those structured non-musically.

As the conventional musical notation is a compound graphic and symbolic
representation, it is fully satisfactory only in its original application, i.e.,
roughly speaking, to professional (composed) music except “New Music”. In
ethnomusicology, as well as in the study of “musically” structured avian
vocalizations, the transcription calls for certain refinements and additional
signs to complement the traditional ones, and at the same time for the
re-examination and clarification of some traditional views and concepts of
theoretical and practical importance concerning music in general and avian musicality
in particular11 (HARTSHORNE, 1958; THORPE, 1961; THORPE & LADE, 1961;
DAVIS & IRBY, 1964; HALAFOFF, 1968; HINDE, 1969; HOLD, 1970).

The purely graphic representation, as already mentioned, may also be used to
represent acoustical phenomena not expressible by conventional, or even
adapted, musical notation. In fact, the graphic (and semigraphic) representation is a
subjective analogy to the objective instantaneous-frequency records and can
be regarded, in a sense, as if the physical record were subjected to a kind of data
reduction process carried out by the pitch and time perception mechanisms.
Thus, from the physical signal represented by the frequency graph, and sub- 


10 “Subjective” is not to be confused with “arbitrary” or “biased”. All three subjective (graphic, semigraphic and
musical) modes of representation applied in this study are justified exclusively on the basis of high (in Passeriformes
regularly 16–64-fold, rarely 128-fold) stretch of time (scientifically demanding slowing down of the speed of
vocalization).

11 In order to avoid misunderstandings, we have to explain, though with some simplification here, at least two
basic terms in this paper. As musical are treated bird sound phenomena (intonation structures) based on a tonal system
(scale) and consisting mostly of tones with “musical pitch intervals” known to us from human music (first of all from
folk music) and, on the physical level, analogous to the harmonic (or possibly quasiharmonic) frequency intervals
(relations of overtones), in so far as they are discriminable and learnable by avian (and human) hearing. Acoustic
phenomena without “musical” intervals in this sense are regarded as non-musical. In the last analysis, “musicality” in
birds as well as in man (and even in the pure physical inorganic sphere) is, in the light of the facts, a question of specific
pitch (frequency) structure, not of function or meaning. Consequently, the “musical” character of bird vocalizations
is also independent of their time (rhythmic or non-rhythmic) structure, although the process of rhythmization
(repetition) had an important share in the evolution of different forms of “avian music” (and, equally, of the developed
non-musical forms of bird vocalization) (Szőke, 1974).
jected to the sound microscopy technique, a perceptual pattern results which is
written down by an experienced scholar in the form of a graphic representation.
In order to make an immediate auditory (pitch) imagination possible, the
semigraphic notation combines the features of pure graphic representation
with the advantages of the musical five-line staff (in the case of both non-musical
and musical structure of vocalization).12

Professionally made subjective transcriptions based on the necessary sound
microscopy can be regarded as close and practically sufficient approximations
to the psychoacoustical (perceptual) form of bird vocalizations in their pitch
and time aspects. The accuracy of the subjective modes of representation (1, 2,
3) presented here can be refined still further within reasonable limits, although
even in their present state they give us much more, and more precise,
information about the pitch and time structure of bird vocalization than objective
sonography inadequately applied (see, for example, Figs 7, 8, 9, 11 with the
corresponding text and SZŐKE–GUNN–FILIP, 1969).

In the following, various examples of bird vocalizations are represented by 
different methods in order to verify in practice the theoretical statements and 
ideas expounded so far. On some of the sonograms the traces of higher harmo- 
nics were eliminated (covered) so that they do not interfere in confrontation 
with other representations revealing only the fundamental frequency. 
On the following pages and in the Figures some abbreviations will be used,
namely: W for “wide band”, N for “narrow band” sonogram, F for “fundamental
(instantaneous) frequency graph”, G for “graphic”, S for “semigraphic” and
M for “musical” representation.13

Fig. 1. The W filter smears the frequency “lines” as though they were being
painted with a wide brush (Davis 1964), so that it is not possible to read out the
instantaneous (pitch-) frequency. This is also symbolized in the musical five-line
staff with note heads unreadably large.

Fig. 2. Musical semitone scale descending from C, to C, (with rounded off 
frequency values in G) played by the first author on a wind-instrument. The 


12 Of similar “semigraphic” character are the experimental folk music notations made by means of a computer in
the Royal Institute of Technology, Stockholm. An example is reproduced from Sundberg & Tjernlund, 1971 (figure not
reproduced here).


13 Explanations of signs used in the Figures: Time data on the left (e.g., 1.4 sec, 0.7 s etc.) refer to the natural, i.e.
not slowed down, duration of the vocalization illustrated. Numerals with arrows: 2↑, 3↑, 4↑ above the clef mean that
the natural pitch is 2, 3 or 4 octaves higher, respectively, than notated, while 1↓, 2↓, 3↓ below the clef mean that
the pitch of the slowed down reproduction is 1, 2 or 3 octaves lower, respectively, than notated. Prolongation or
shortening of notes: ( means a slight extension, while ) means a slight shortening of value of the marked note. Numbers in
squares: [16], [32], [64] mean that the natural duration is stretched (i.e. the speed slowed down) 16, 32 or 64 times,
respectively. Metronome marking, for example ♩ = 60, indicates the approximate tempo of the slowed down vocalization,
i.e., in this example 60 quarter notes per minute. (Subjective illustrations in some Figures give data in “centiseconds”.
To obtain standardized indication in milliseconds please multiply the given numbers by 10.)
overlapping of consecutive tones is clearly seen in N but not, with one exception,
in W. The overlapping is caused by ringing (i.e., decay transient of the N filter).
In N the only tone F, in the middle of the scale, intentionally played shorter
than the others, seems almost to touch the subsequent one: here the ringing
effect covers almost the whole rest between the two tones. The real endings
of the tones are marked by pale vertical transients (see Footnote 6). This clear
non-avian example makes it easier to understand the artifacts that may be caused
by ringing under more complex circumstances and when combined with other
effects in birds.

Fig. 3. Repeated musical “horn motifs” as a portion of a song of the Great
Tit (Parus major). F with its linear semitone (logarithmic frequency)
calibration refers clearly to the musical pitch structure (intonation contour) perceived
when slowed down 32 times and illustrated in M, whereas W masks the
frequency (pitch-) structure.

Fig. 4 W, F, M. Grey Warbler (Gerygone igata) song (recorded by K. & J.
BIGWOOD, New Zealand) of surprisingly folksong-like three-section form (see
in F, M and W) with a short “introductory” part (a) and a recitative “rhythm”
(b). In contrast to the frequency-smearing effect of W, graph F (without the
“introductory” part) displays clearly the fundamental frequency as a function of
time. Note the convincing parallelism (analogy) between the objective F and
the subjective M. (In W the two initial tones of M are not recorded.)

Fig. 5. Yellow-breasted Tit (Petroica macrocephala) song of folksong-like
one-section form (recorded by K. & J. BIGWOOD, New Zealand). F (here with
logarithmic semitone calibration) refers again to the musical perception of the song
structure shown in M on the basis of a 16 and 32 times slowed down playback.
In F the pitch level of the song is not quite fixed, and in M is slightly higher
than in fact.14 The time structure can be portrayed still more precisely if
represented graphically (or semigraphically).

Fig. 6. Hermit Thrush (Hylocichla guttata) song (recorded by D. J. Borror,
Ohio). F and M show the musical micro-structure of the second part of the song,
set in frame on W.

Fig. 7. The initial part of a song of the Wren (Troglodytes troglodytes). The
bird is erroneously regarded as one of the most famous avian “musicians” of
Europe. Its song, however, is non-musical, for it consists only of slurring
(glissando) sounds, the continuously changing frequency (intonation contour) of
which is also obscured by the large bands in W. In contrast, S (based on 32-
and 64-fold time stretch) displays the non-musical perceptual pattern (i.e.
the intonation contour involving the time structure) distinctly. The semigraphic
mode of representation can be used by scientists both unacquainted and familiar
with music-reading. Trained note-readers, however, are able to decode
the semigraphic (five-line staff) representation much better (even in the case of
non-musical structure of bird vocalization), especially concerning the sounding
forms of bird vocalization (slowed down).


14 The general pitch level (tessitura) of the singing of birds often undergoes some slight and insignificant continuous
changes (temporary small-scale decrease or increase of frequency) without being perceivable on the auditory level
of birds (and of man as well). As in general the slight frequency (pitch) level instability of this kind (as well as the
simultaneously arising subliminal temporal inequality of tones of “equal duration”) escapes notice even at a greatly
slowed down speed, these latent and contingent changes, being inessential physical phenomena, cannot and must not
be visualized in the subjective graphic and musical illustrations, in contrast to the objective frequency graphs. This,
however, does not mean that the structural analogy between the physical and the psycho-physiological processes of
vocalization and hearing in birds is questionable.
Fig. 8. Redstart (Phoenicurus phoenicurus) song of non-musical structure.
(In W the initial part of the long introductory tone has been accidentally cut
off.) Here too the intonation structure remains unrecognizable in W, in contrast to
S, which offers a full and reliable picture of the pitch (frequency) and time pattern
of the song.


Fig. 9. One of the highly developed micromelodies of the Hermit Thrush
(Hylocichla guttata) (recorded by W. W. H. Gunn, Canada). When slowed
down greatly the detailed pitch and time structure can easily be revealed even
by an inexperienced listener as a surprisingly “human-like” song form (M).
The slow avian melody in M was resung by the first author and speeded up
again 32 times to be spectrographed in order to compare the new spectrogram
W, with that of the original avian song W. It is no surprise that in W, the
“human-like” musical character of the tune disappeared completely and to our ears
(and eyes) the song structure became unrecognizable again. The long vertical
lines in W, represent buildup and decay transients. N is a narrow band
variant of W,. It also distinguishes itself by long vertical transient lines.
However, as a consequence of ringing, in N these vertical transient effects grew
misleadingly thick (extended rightwards on the sonogram). (The
reverberation-like pale multiplication of markings on some narrow band sonograms, as
in Figs 9N, 18N, is an eliminable sort of distortion caused by the
Sonagraph.) Further on, the slow melody M was played by the first author on a
wind-instrument in order to obtain a maximally clear intonation contour, then
this was speeded up again to about the natural pitch level and duration of
the original bird song, and spectrographed as well (N,). Comparison of W of
the natural avian song with N, of this man-performed instrumental avian tune
shows how in the latter the ringing smears even the separate, not directly
neighbouring tones of the same frequency into seemingly continuous long
horizontal traces.

Fig. 10. Great Tit (Parus major) call structured musically. In W the long
vertical lines caused by the buildup (onset) and decay transient responses of the
Sonagraph occur at sufficient distances from each other not to cause too much
disturbance (though here the width of bands also makes the fundamental-frequency
structure rather unrecognizable). G and M show the musical pattern clearly.

Even with wide band spectrograms insurmountable difficulties may arise
from short sounds with rapid attacks, which result in long vertical wide-band
transient lines in the record. All the more so with narrow band spectrograms:
rapid transients and frequency modulated tones (e. g. vibratos) present
time-resolution problems that can hardly be overcome, as already mentioned in
Section I.
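As a rough numerical illustration of this time-resolution problem, the relation ΔfΔt = u of Section I can be evaluated for the two filter bandwidths quoted for the Sonagraph in this paper (wide band 300 Hz, narrow band 45 Hz). The sketch below is only an illustration: the function name `response_time_ms` is hypothetical, and u = 1 is an assumption, since the constant depends on how bandwidth and duration are defined.

```python
# Illustrative sketch: response time of an analysing filter from df * dt = u.
# Assumption: u = 1 (the true constant depends on the definitions chosen).

WIDE_BAND_HZ = 300.0    # wide band filter width quoted in the paper
NARROW_BAND_HZ = 45.0   # narrow band filter width quoted in the paper


def response_time_ms(bandwidth_hz, u=1.0):
    """Return the approximate filter response time dt = u / df in milliseconds."""
    return 1000.0 * u / bandwidth_hz


for name, bandwidth in (("wide", WIDE_BAND_HZ), ("narrow", NARROW_BAND_HZ)):
    print(f"{name} band ({bandwidth:.0f} Hz): about "
          f"{response_time_ms(bandwidth):.1f} ms")
```

Under this assumption the narrow band response time (about 22 ms) is of the same order as the 40 ms spacing of the micro-motifs in Fig. 11 (1500 per minute), which suggests why successive transients run together there.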

Fig. 11. Grasshopper Warbler (Locustella naevia). Song portion consisting
(differently from that in Fig. 10) of a rapid succession of short discrete tones
(about 1500 micro-motifs per min) with a dense series of extremely wide
transient responses of the Sonagraph (the long vertical lines in W) which make
the actual vocal pattern, shown clearly in S and M, totally unrecognizable. In N
the vertical transient lines are considerably thickened and smeared together by
the ringing effect of the narrow-band filter. Thus N becomes still more crowded
than W. (See Footnote 6.)

Fig. 12. Six different fractions ABCDEF of a continuous song of the Sky
Lark (Alauda arvensis). The comparison of mutually corresponding rhythmic
patterns A through F (in W and M) shows an unacceptable smearing effect of
the bandwidth and the dense and long transient lines produced by the
Sonagraph. W masks the rhythmic patterns of the song fractions; M, however,
discloses them.

Fig. 13. Due to the dense groups of vertical transient lines and the large
bandwidth in W it is impossible to recognize the characteristic primitive
musical rhythmic pattern of the song of the Grasshopper Sparrow (Ammodramus
savannarum) (recorded by W. W. H. Gunn, Canada), though in this case W
was made exceptionally at a speed slowed down 2 times. When stretched 64
times (with the necessary careful technique), the rhythmic pattern is distinctly
audible in all the details represented in M. Note the almost perfectly regular
alternation of four-beat and five-beat micro-measures. (W starts at "s," in M.)

Fig. 14. River Warbler (Locustella fluviatilis). Song portion repeating rapidly
(about 600 times per min) a longer and more complex musical micro-motif (S)
of about 7-8 msec duration, with extremely crowded transient phenomena and
wide frequency bands in W. The musical structure M of the song (with some
short non-musical slurrings), although distinctly audible in all details at a
tape speed reduced 32 or 64 times, is completely concealed in W. This is an
expressive example of the different reliability of the representational methods
shown in W, S and M.

Fig. 15. Ortolan Bunting (Emberiza hortulana) song. To the "trills" and the
rapid final warbling (vibrato V in W) of the song the instantaneous-frequency
recorder responds (in F) with a clear indication of the instantaneous
pitch-frequency level (though here not sufficiently expressive owing to the
excessive typographic reduction in size of the original graph), while W responds
with excessively wide bands caused by a close succession of vertical transient
lines smeared together. Compare the four different representations V, V1, V2, V3
of the warbling final tone. In F (if V, stretched still more than represented
here) the rate of warbling, about 350 cycles of fluctuation per second, can
easily be calculated owing to the adjustability of the film speed. Of course,
independently of this rapidity, the warbling can also be made audible in all
details when slowed down 64 times.
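The arithmetic behind such slowing down can be sketched briefly. Assuming a plain reduction of tape speed, every frequency is divided by the slowing factor and every duration is multiplied by it. In the sketch below only the warbling rate (350 per second) and the factor 64 are taken from the caption; the function names and the 4000 Hz tone are hypothetical and serve only as illustration.

```python
import math

# Illustrative sketch of tape-speed reduction ("sound microscopy").
# Assumption: playback at 1/factor speed divides all frequencies by `factor`
# and multiplies all durations by `factor`.


def slowed(frequency_hz, duration_s, factor):
    """Return (frequency, duration) as heard at a speed reduced `factor` times."""
    return frequency_hz / factor, duration_s * factor


def octaves_lowered(factor):
    """Return the number of octaves by which the pitch drops."""
    return math.log2(factor)


WARBLE_RATE_HZ = 350.0   # warbling rate given in the caption of Fig. 15
FACTOR = 64              # slowing factor given in the caption of Fig. 15

slow_rate, slow_second = slowed(WARBLE_RATE_HZ, 1.0, FACTOR)
print(f"warbling rate when slowed: {slow_rate:.2f} fluctuations per second")
print(f"one second of song lasts {slow_second:.0f} s")
print(f"pitch lowered by {octaves_lowered(FACTOR):.0f} octaves")

# Hypothetical 4000 Hz tone, chosen only to illustrate the pitch shift.
tone_slow, _ = slowed(4000.0, 0.01, FACTOR)
print(f"a 4000 Hz tone is heard at {tone_slow:.1f} Hz")
```

At about 5.5 fluctuations per second the individual cycles of the warbling fall into a range where they can be followed by ear, which is consistent with the statement in the caption.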

Special difficulties arise when frequency modulated tones (e. g. vibratos), so
common in bird vocalizations, are to be spectrographed with a narrow band
filter. If the modulation frequency becomes greater than the filter bandwidth of
the Sonagraph (wide band: usually 300 Hz, narrow band: 45 Hz), then the
sonogram no longer shows the vibrato, i. e. the periodic frequency fluctuation
of the signal. Instead the signal, i. e. the warbling tone, is split up into
several horizontal side bands running one above the other. The resolving of the
frequency modulated final tone (V, V,, V3) into side bands is displayed, for
example, in V, (Fig. 15).
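The splitting into side bands can be reproduced numerically. The following is a minimal sketch, not the authors' procedure: a sinusoidally frequency modulated tone is synthesized and its line spectrum is computed with a plain discrete Fourier transform. The modulation rate of 350 Hz is the warbling rate quoted for Fig. 15; the carrier (5000 Hz), the frequency deviation (350 Hz), the sampling parameters and the function name `fm_spectrum_lines` are assumptions chosen for illustration.

```python
import cmath
import math

# Illustrative sketch: spectrum of a frequency modulated ("warbling") tone.
# Assumed values: carrier 5000 Hz, deviation 350 Hz, fs 16000 Hz, 320 samples.
# With these values the DFT bins are 50 Hz apart and every spectral line
# falls exactly on a bin.


def fm_spectrum_lines(carrier_hz, mod_hz, deviation_hz,
                      fs=16000, n=320, rel_threshold=0.1):
    """Return the frequencies (Hz) of the strong spectral lines of an FM tone."""
    beta = deviation_hz / mod_hz  # modulation index
    signal = [
        math.cos(2 * math.pi * carrier_hz * k / fs
                 + beta * math.sin(2 * math.pi * mod_hz * k / fs))
        for k in range(n)
    ]
    magnitudes = []
    for b in range(n // 2 + 1):
        step = -2j * math.pi * b / n
        magnitudes.append(abs(sum(signal[k] * cmath.exp(step * k)
                                  for k in range(n))))
    peak = max(magnitudes)
    return [b * fs / n for b, m in enumerate(magnitudes)
            if m > rel_threshold * peak]


lines = fm_spectrum_lines(carrier_hz=5000.0, mod_hz=350.0, deviation_hz=350.0)
print(lines)  # lines at the carrier and at multiples of 350 Hz on either side
```

The lines are spaced by the modulation frequency, here 350 Hz. Since this spacing exceeds even the 300 Hz wide band filter width, an analysing filter would respond to each line separately and draw horizontal side bands rather than a fluctuating trace.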

Fig. 16. Corn Bunting (Emberiza calandra) song. Behind the conspicuous
vertical transient lines and wide frequency bands in W a very complex,
predominantly musical structure is hidden (displayed semigraphically in S).
This is an expressive example of how the wide band sonogram visually masks
the "pitch" and the time structure of complex bird songs. Compare, for
instance, the five initial "flag-like" bands in W with the corresponding five
initial micro-motifs (with corresponding numbering) of musical structure, each
composed of several discrete short tones as shown in S, which displays
distinctly the whole fine structure that is smeared in W. The existence of such
well-formed and richly patterned bird songs, designating both the species and
the individual, cannot be merely the acoustic result of latent (innate) myogenic
or neurogenic processes; in the last analysis it can, in some sense, only result
from the social life of the species (of course, on a physical and
psycho-physiological basis). This means that the pattern (intonation contour)
of such complex songs must be audible to the birds in every structural detail.

Fig. 17. The well known call of the Greenfinch (Carduelis chloris). W gives
a sound picture blurred by the extreme density of long transient lines,
presenting striking evidence of heavy visual masking of the frequency and time
structure of bird calls consisting of rapid successions of discrete short tones
(here about 140 per second) with more or less definite pitch, as shown in G at a
speed slowed down 64 times, representing only a short portion of the call. With
its series of discrete apparent transient lines another wide band sonogram W,
made at a two-fold stretch of time, reveals the periodically interrupted
(tremolo, buzz-like) time structure of the call, hiding at the same time the
frequency (pitch) structure by the wide (long) transient lines. By counting the
apparent transient lines (each in fact smearing together both the buildup and
the decay effects) it may be found that the mean rate of sound bursts in the
tremolo is about 160 per second, decreasing towards the final part of the call
to about 145 bursts per second. This agrees sufficiently with the result of
140 per second obtained through aural counting of the sound bursts at a tape
speed slowed down 64 times (G). In Y the series of vertical transient lines
characteristic of such wide band sonograms becomes transformed into an
extremely wide band of fluctuating and interlocked long horizontal frequency
lines lying closely on one another. Here both the time resolution and the
frequency resolution are insufficient.

Fig. 18. Short (0.5 s) alarm call of the Great Tit (Parus major), musically
structured and containing three warbling tones, i. e. vibratos (a, b, c or
1, 2, 3). Graphic representation G of a short portion of the warbling tone
a (= 1) shows a warbling of 225 vibrato cycles per second, which are distinctly
audible if slowed down appropriately. N is distorted by side bands of the
warbling tones 1, 2, 3 as well as by other smearing effects (e. g. ringing).

Here we have come to the end of the demonstration of our practical examples
showing that the actual intonation patterns of bird vocalizations must be
analyzed in an adequate way. Without a thorough and demanding knowledge
of bird vocalizations, without the knowledge of their pitch (frequency) and
time structure, it is difficult, if not impossible, to study certain essential
aspects of avian life sufficiently comprehensively and reliably. Moreover, the
significance of ornithological acoustics (implying ornithomusicology) for
musicology in general, for the disclosure of the pre-human and, in general,
presumably biological fundamentals of human music aesthetics, and for some
other natural and social sciences, also calls for more adequate methods. These
methods imply, of course, the necessity for a deeper study of birds' auditory
mechanisms, first of all their time discrimination.

The main aim of this paper is not to demonstrate our new experimental
representational methods. However, when confronted with the sonograms
conventionally applied, these new methods seem suitable for making evident the
inadequacy of the sonograms in the applications mentioned. Further, we believe
that, in general, these methods (both objective and subjective) represent a
promising direction in which the study of the intonation structure of bird
vocalizations (and in most cases of those of fishes, amphibia, reptilia and
mammalia too) may be successfully developed.

Summary

The spectrographic method producing the well known sonograms has found
considerably wide use in bio-acoustics. In certain applications, particularly
in some ornitho-acoustical ones, it must be borne in mind, however, that it
represents the spectrum density as a function of time and frequency. The
statement that the Sonagraph performs the "frequency/time" analysis is to be
understood in the sense that the result of the analysis is plotted in a
coordinate system with a frequency axis and a time axis. The result of the
analysis itself, i. e. the sonogram, is, strictly speaking, a
"spectrum-density/time" graphic representation, not a "frequency/time" one. If
we study, however, that parameter of acoustical stimuli which is perceived as
pitch (varying or constant, changing stepwise or continuously) or, in short, if
it is the intonation structure of avian vocalization that is to be investigated,
then the only adequate method is the recording of instantaneous frequency
defined in the time domain. This has long been recognized in experimental
phonetics and in musical acoustics but, with only a few exceptions, not in
bio-acoustics. The paper aims at demonstrating that for the study of intonation
(pitch and time structure) the spectrographic method is far from optimum
because it is a linear method, subject to the uncertainty principle. Particular
difficulties are encountered with rapid successions of short sounds and with
frequency modulated or periodically interrupted tones. On the other hand, the
time resolution of the instantaneous-frequency measurement is one cycle, and
its frequency resolution depends only on the accuracy of the measuring
equipment and on the signal-to-noise ratio. The instantaneous-frequency graphs
also confirm the validity and scientific value of the graphic, semigraphic and
adapted musical representation based on sound microscopy, shown in the paper
with corresponding sonograms.
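The statement that the time resolution of an instantaneous-frequency measurement is one cycle can be illustrated with a minimal sketch. It assumes a clean single tone and measures the period between successive rising zero crossings; this is one common way of defining instantaneous frequency in the time domain and is not necessarily the circuit used for the graphs in this paper. The function name, the sampling rate and the test tone (2000 Hz stepping to 3000 Hz) are hypothetical.

```python
import math

# Illustrative sketch: instantaneous frequency from rising zero crossings.
# Assumptions: a clean single tone, sampling rate 48000 Hz, and a test signal
# that steps from 2000 Hz to 3000 Hz after 10 ms.


def instantaneous_frequency(samples, fs):
    """Return one (time_s, frequency_hz) pair per cycle of the signal."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation of the crossing instant between two samples.
            crossings.append((i - 1 + a / (a - b)) / fs)
    return [(t1, 1.0 / (t1 - t0))
            for t0, t1 in zip(crossings, crossings[1:])]


FS = 48000
phase = 0.0
samples = []
for n in range(960):                      # 20 ms of signal
    frequency = 2000.0 if n < 480 else 3000.0
    samples.append(math.sin(phase))
    phase += 2 * math.pi * frequency / FS

track = instantaneous_frequency(samples, FS)
for time_s, freq_hz in track[::10]:
    print(f"{1000 * time_s:6.2f} ms  {freq_hz:7.1f} Hz")
```

Each value in `track` is obtained from a single period, so the step in frequency appears within one cycle, without the spreading in time that a band filter introduces. In practice noise shifts the zero crossings, which is why the accuracy depends on the signal-to-noise ratio, as stated above.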


ZUSAMMENFASSUNG (GERMAN SUMMARY)


The use of sonograms produced by the linear method of sound spectrography is widespread
in bioacoustics. The fact is disregarded, however, that sonograms represent not the frequency
of the tones but their spectrum width (spectrum density) as a function of time. Thus the sonogram
does not represent that parameter of the acoustic stimuli which birds (and man too) perceive as
pitch, but another physical parameter, which plays no part in the pitch discrimination of birds.
The bioacoustic literature asserts that the Sonagraph produces a "frequency/time" analysis. In
fact, however, the Sonagraph analyses the "spectrum-density/time" structure, and the sonogram,
although it is drawn in a "frequency/time" coordinate system, actually displays the
"spectrum-density/time" structure and not the "frequency/time" structure of the bird voice. In
practice this means that the sonogram does not depict but conceals the "intonation shape"
(Intonationsgestalt), i. e. the real "pitch/time" structure of bird voices. Our paper shows that
the spectrographic method is in principle unsuited to representing the intonation shape. The
task is therefore not to perfect this method but to replace it with another physical method
(non-linear, adequate and exact) to which the uncertainty principle does not apply. The
uncertainty principle causes insurmountable difficulties especially in the analysis of rapid
successions of tones and of frequency modulations.

The only method adequate in principle for investigating the "frequency/time" structure,
i. e. the intonation shape of bird voices, is the representation of the instantaneous frequency.
This non-linear objective method also corroborates the psychoacoustic adequacy of the subjective
(correlative) graphic and biomusical representations of bird voices properly prepared on the
basis of strong slowing down of the sound.
Fig. 1. The frequency (pitch) smearing effect of wide bands

Fig. 4M. Grey Warbler, musical representation of Fig. 4 F(W)

Fig. 5. Yellow-breasted Tit song

Fig. 6. Hermit Thrush song

Fig. 7. Wren, non-musical song (initial part)

Fig. 8. Redstart, non-musical song

Fig. 9. Hermit Thrush, a folksong-like micro-melody

Fig. 10. Great Tit call of musical structure

Fig. 11. Grasshopper Warbler, song portion

Fig. 14. River Warbler, song portion

Fig. 15. Ortolan Bunting song

Fig. 16. Corn Bunting song

Fig. 17. Greenfinch call

Fig. 18. Great Tit, musically structured call with vibratos


